a tiny vision language model that kicks ass and runs anywhere
Moondream is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint. It's designed to be versatile and accessible, capable of running on a wide range of devices and platforms.
The project offers two model variants:
- Moondream 2B: The primary model with 2 billion parameters, offering robust performance for general-purpose image understanding tasks including captioning, visual question answering, and object detection.
- Moondream 0.5B: A compact 500 million parameter model specifically optimized as a distillation target for edge devices, enabling efficient deployment on resource-constrained hardware while maintaining impressive capabilities.
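As a rough sketch of what those tasks look like in practice, the snippet below loads the 2B model from Hugging Face and runs captioning, visual question answering, and object detection. The `vikhyatk/moondream2` model id is the published checkpoint; the `caption`, `query`, and `detect` methods follow the model card, but verify them (and pin a `revision`) against the release you install.

```python
# Minimal local-inference sketch. Method names follow the Hugging Face
# model card and should be checked against the revision you install.
from PIL import Image
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    trust_remote_code=True,  # Moondream ships its own modeling code
    # device_map={"": "cuda"},  # uncomment to run on a GPU
)

image = Image.open("photo.jpg")

# Captioning
print(model.caption(image, length="short")["caption"])

# Visual question answering
print(model.query(image, "How many people are in the image?")["answer"])

# Object detection
print(model.detect(image, "face")["objects"])
```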
Moondream can be run locally or in the cloud. Please refer to the Getting Started page for details.
- Modal - Modal lets you run jobs in the cloud by writing just a few lines of Python. Here's an example of how to run Moondream on Modal:
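The sketch below shows one way a Modal deployment could look, using Modal's `App`, `Image`, and `local_entrypoint` APIs with the Hugging Face checkpoint above. The GPU type, pip dependency list, and the `caption` function are illustrative assumptions, not a prescribed setup.

```python
# Hypothetical Modal deployment sketch: run with `modal run moondream_modal.py`.
# The GPU type and dependency list are assumptions; adjust as needed.
import modal

app = modal.App("moondream-demo")

image = modal.Image.debian_slim().pip_install(
    "transformers", "torch", "pillow", "einops", "accelerate"
)

@app.function(gpu="T4", image=image)
def caption(image_bytes: bytes) -> str:
    import io
    from PIL import Image
    from transformers import AutoModelForCausalLM

    # Load the model inside the remote container so weights are
    # downloaded where the function actually runs.
    model = AutoModelForCausalLM.from_pretrained(
        "vikhyatk/moondream2",
        trust_remote_code=True,
        device_map={"": "cuda"},
    )
    img = Image.open(io.BytesIO(image_bytes))
    return model.caption(img, length="short")["caption"]

@app.local_entrypoint()
def main(path: str = "photo.jpg"):
    # Read the image locally and ship the bytes to the cloud function.
    with open(path, "rb") as f:
        print(caption.remote(f.read()))
```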